Abstract

In this paper¹ we present a new parsing algorithm for linear
indexed grammars (LIGs) in the same spirit as the one described
in (Vijay-Shanker and Weir, 1993) for tree adjoining grammars.
For a LIG L and an input string x of length n, we build a non
ambiguous context-free grammar whose sentences are all (and
exclusively) valid derivation sequences in L which lead to x.
We show that this grammar can be built in O(n^6) time and that
individual parses can be extracted in time linear in the size of
the extracted parse tree. Though this O(n^6) upper bound does
not improve over previous results, the average case behaves much
better. Moreover, practical parsing times can be decreased by
some statically performed computations.



1 Introduction 

The class of mildly context-sensitive languages can be described
by several equivalent grammar types. Among these types we can
notably cite tree adjoining grammars (TAGs) and linear indexed
grammars (LIGs). In (Vijay-Shanker and Weir, 1994) TAGs are
transformed into equivalent LIGs. Though context-sensitive
linguistic phenomena seem to be more naturally expressed in the
TAG formalism, from a computational point of view, many authors
think that LIGs play a central role and therefore the
understanding of LIGs and LIG parsing is of importance. For
example, quoted from (Schabes and Shieber, 1994): "The LIG
version of TAG can be used for recognition and parsing. Because
the LIG formalism is based on augmented rewriting, the parsing
algorithms can be much simpler to understand and easier to
modify, and no loss of generality is incurred". In
(Vijay-Shanker and Weir, 1993) LIGs are used to express the
derivations of a sentence in TAGs. In (Vijay-Shanker, Weir and
Rambow, 1995) the approach used for parsing a new formalism, the
D-Tree Grammars (DTG), is to translate a DTG into a Linear
Prioritized Multiset Grammar which is similar to a LIG but uses
multisets in place of stacks.

LIGs can be seen as usual context-free grammars 
(CFGs) upon which constraints are imposed. These 
constraints are expressed by stacks of symbols as- 
sociated with non-terminals. We study parsing of 
LIGs, our goal being to define a structure that ver- 
ifies the LIG constraints and codes all (and exclu- 
sively) parse trees deriving sentences. 

Since derivations in LIGs are constrained CF derivations, we can
think of a scheme where the CF derivations for a given input are
expressed by a shared forest from which the individual parse
trees which do not satisfy the LIG constraints are erased.
Unfortunately this view is too simplistic, since the erasing of
individual trees whose parts can be shared with other valid
trees can only be performed after some unfolding (unsharing)
that can produce a forest whose size is exponential or even
unbounded.



In (Vijay-Shanker and Weir, 1993), the context-freeness of
adjunction in TAGs is captured by giving a CFG to represent the
set of all possible derivation sequences. In this paper we study
a new parsing scheme for LIGs based upon similar principles and
which, on the other hand, emphasizes, as (Lang, 1991) and
(Lang, 1994) do, the use of grammars (shared forests) to
represent parse trees, and is an extension of our previous work
(Boullier, 1995).

1 See (Boullier, 1996) for an extended version.

This previous paper describes a recognition algorithm for LIGs,
but not a parser. For a LIG and an input string, all valid parse
trees are actually coded into the CF shared parse forest used by
this recognizer, but, on some parse trees of this forest, the
checking of the LIG constraints may fail. At first sight, there
are two conceivable ways to extend this recognizer into a
parser:

1. only "good" trees are kept; 

2. the LIG constraints are [re-] checked while the 
extraction of valid trees is performed. 

As explained above, the first solution can produce 
an unbounded number of trees. The second solution 
is also uncomfortable since it necessitates the reeval- 
uation on each tree of the LIG conditions and, doing 
so, we move away from the usual idea that individ- 
ual parse trees can be extracted by a simple walk 
through a structure. 

In this paper, we advocate a third way which will use (see
section 4) the same basic material as the one used in
(Boullier, 1995). For a given LIG L and an input string x, we
exhibit a non ambiguous CFG whose sentences are all possible
valid derivation sequences in L which lead to x. We show that
this CFG can be constructed in O(n^6) time and that individual
parses can be extracted in time linear in the size of the
extracted tree.

2 Derivation Grammar and CF 
Parse Forest 

In a CFG $G = (V_N, V_T, P, S)$, the derives relation
$\underset{G}{\Rightarrow}$ is the set
$\{(\sigma B \sigma', \sigma \beta \sigma') \mid B \rightarrow \beta \in P \wedge V = V_N \cup V_T \wedge \sigma, \sigma' \in V^*\}$.
A derivation is a sequence of strings in $V^*$ s.t. the relation
derives holds between any two consecutive strings. In a
rightmost derivation, at each step, the rightmost non-terminal,
say $B$, is replaced by the right-hand side (RHS) of a
$B$-production. Equivalently, if
$\sigma_0 \overset{r_1}{\underset{G}{\Rightarrow}} \ldots \overset{r_n}{\underset{G}{\Rightarrow}} \sigma_n$
is a rightmost derivation where the relation symbol is overlined
by the production used at each step, we say that
$r_1 \ldots r_n$ is a rightmost $\sigma_0/\sigma_n$-derivation.

For a CFG $G$, the set of its rightmost $S/x$-derivations, where
$x \in \mathcal{L}(G)$, can itself be defined by a grammar.

Definition 1 Let $G = (V_N, V_T, P, S)$ be a CFG, its rightmost
derivation grammar is the CFG $D = (V_N, P, P^D, S)$ where
$P^D = \{A_0 \rightarrow A_1 \ldots A_q\, r \mid r = A_0 \rightarrow w_0 A_1 w_1 \ldots w_{q-1} A_q w_q \in P \wedge w_i \in V_T^* \wedge A_j \in V_N\}$

From the natural bijection between $P$ and $P^D$, we can easily
prove that

$$\mathcal{L}(D) = \{r_n \ldots r_1 \mid r_1 \ldots r_n \text{ is a rightmost } S/x\text{-derivation in } G\}$$



This shows that the rightmost derivation language of a CFG is
also CF. We will show in section 4 that a similar result holds
for LIGs.
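
As an illustration of Definition 1, the following minimal sketch builds the rightmost derivation grammar of a toy CFG and enumerates a few of its sentences. The toy grammar (r1 = S -> a S b, r2 = S -> c), the dictionary encoding and the function names are assumptions made for this example only; they do not come from the paper.

```python
from itertools import product

# Toy CFG chosen for illustration (not from the paper):
# r1 = S -> a S b, r2 = S -> c.
PRODUCTIONS = {"r1": ("S", ["a", "S", "b"]), "r2": ("S", ["c"])}
NONTERMINALS = {"S"}


def derivation_grammar(productions, nonterminals):
    """Turn each r = A0 -> w0 A1 w1 ... Aq wq into A0 -> A1 ... Aq r."""
    return {
        name: (lhs, [s for s in rhs if s in nonterminals] + [name])
        for name, (lhs, rhs) in productions.items()
    }


def sentences(dgrammar, symbol, nonterminals, depth):
    """Sentences derivable from `symbol` with at most `depth` nested productions."""
    if symbol not in nonterminals:
        return {(symbol,)}  # production names are the terminals of D
    if depth == 0:
        return set()
    found = set()
    for lhs, rhs in dgrammar.values():
        if lhs == symbol:
            parts = [sentences(dgrammar, s, nonterminals, depth - 1) for s in rhs]
            for combo in product(*parts):
                found.add(tuple(t for part in combo for t in part))
    return found


D = derivation_grammar(PRODUCTIONS, NONTERMINALS)
LANGUAGE = sentences(D, "S", NONTERMINALS, 3)
```

For the string acb, the rightmost derivation in the toy grammar uses r1 then r2, and the corresponding sentence of D is the reversed sequence r2 r1, as the identity above states.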



Following (Lang, 1994), CF parsing is the intersection of a CFG
and a finite-state automaton (FSA) which models the input string
$x$². The result of this intersection is a CFG
$G^x = (V_N^x, V_T^x, P^x, S^x)$ called a shared parse forest,
which is a specialization of the initial CFG
$G = (V_N, V_T, P, S)$ to $x$. Each production $r_i^j \in P^x$
is the production $r_i \in P$ up to some non-terminal renaming.
The non-terminal symbols in $V_N^x$ are triples denoted
$[A]_p^q$ where $A \in V_N$, and $p$ and $q$ are states. When
such a non-terminal is productive,
$[A]_p^q \overset{+}{\underset{G^x}{\Rightarrow}} w$, we have
$q \in \delta(p, w)$.

If we build the rightmost derivation grammar associated with a
shared parse forest, and we remove all its useless symbols, we
get a reduced CFG, say $D^x$. The CF recognition problem for
$(G, x)$ is equivalent to the existence of an
$[S]_0^n$-production in $D^x$. Moreover, each rightmost
$S/x$-derivation in $G$ is (the reverse of) a sentence in
$\mathcal{L}(D^x)$. However, this result is not very interesting
since individual parse trees can be as easily extracted directly
from the parse forest. This is due to the fact that in the CF
case, a tree that is derived (a parse tree) contains all the
information about its derivation (the sequence of rewritings
used) and therefore there is no need to distinguish between
these two notions. Though this is not always the case with non
CF formalisms, we will see in the next sections that a similar
approach, when applied to LIGs, leads to a shared parse forest
which is a LIG while it is possible to define a derivation
grammar which is CF.

3 Linear Indexed Grammars 

An indexed grammar is a CFG in which stacks of symbols are
associated with non-terminals. LIGs are a restricted form of
indexed grammars in which the dependence between stacks is such
that at most one stack in the RHS of a production is related to
the stack in its LHS. Other non-terminals are associated with
independent stacks of bounded size.



Following (Vijay-Shanker and Weir, 1994):

Definition 2 $L = (V_N, V_T, V_I, P_L, S)$ denotes a LIG where
$V_N$, $V_T$, $V_I$ and $P_L$ are respectively finite sets of
non-terminals, terminals, stack symbols and productions, and $S$
is the start symbol.

In the sequel we will only consider a restricted form of LIGs
with productions of the form

$$P_L = \{A() \rightarrow w\} \cup \{A(..\alpha) \rightarrow \Gamma_1 B(..\alpha') \Gamma_2\}$$

where $A, B \in V_N$, $w \in V_T^* \wedge 0 \le |w| \le 2$,
$\alpha\alpha' \in V_I^* \wedge 0 \le |\alpha\alpha'| \le 1$ and
$\Gamma_1\Gamma_2 \in V_T \cup \{\varepsilon\} \cup \{C() \mid C \in V_N\}$.
An element like $A(..\alpha)$ is a primary constituent while
$C()$ is a secondary constituent. The stack schema $(..\alpha)$
of a primary constituent matches all the stacks whose prefix
(bottom) part is left unspecified and whose suffix (top) part is
$\alpha$; the stack of a secondary constituent is always empty.

2 If $x = a_1 \ldots a_n$, the states can be the integers
$0 \ldots n$, $0$ is the initial state, $n$ the unique final
state, and the transition function $\delta$ is s.t.
$i \in \delta(i-1, a_i)$ and $i \in \delta(i, \varepsilon)$.

Such a form has been chosen both for complexity 
reasons and to decrease the number of cases we have 
to deal with. However, it is easy to see that this form 
of LIG constitutes a normal form. 

We use $r()$ to denote a production in $P_L$, where the
parentheses remind us that we are in a LIG!

The CF-backbone of a LIG is the underlying CFG in which each
production is a LIG production where the stack part of each
constituent has been deleted, leaving only the non-terminal
part. We will only consider LIGs such that there is a bijection
between their production set and the production set of their
CF-backbone³.

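We call object the pair denoted $A(\alpha)$ where $A$ is a
non-terminal and $(\alpha)$ a stack of symbols. Let
$V_O = \{A(\alpha) \mid A \in V_N \wedge \alpha \in V_I^*\}$ be
the set of objects. We define on $(V_O \cup V_T)^*$ the binary
relation derives denoted $\underset{L}{\Rightarrow}$ (the
relation symbol is sometimes overlined by a production):

$$\Gamma'_1 A(\alpha''\alpha) \Gamma'_2 \overset{A(..\alpha) \rightarrow \Gamma_1 B(..\alpha') \Gamma_2}{\underset{L}{\Longrightarrow}} \Gamma'_1 \Gamma_1 B(\alpha''\alpha') \Gamma_2 \Gamma'_2$$

$$\Gamma'_1 A() \Gamma'_2 \overset{A() \rightarrow w}{\underset{L}{\Longrightarrow}} \Gamma'_1 w \Gamma'_2$$

In the first element above we say that the object
$B(\alpha''\alpha')$ is the distinguished child of
$A(\alpha''\alpha)$, and if $\Gamma_1\Gamma_2 = C()$, $C()$ is
the secondary object. A derivation
$\Gamma'_1, \ldots, \Gamma'_i, \Gamma'_{i+1}, \ldots, \Gamma'_l$
is a sequence of strings where the relation derives holds
between any two consecutive strings.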
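
The effect of a primary production on the stack of an object can be sketched as follows. This is only an illustration under stated assumptions: the tuple encoding of productions and objects, the function name `derive` and the symbol names `ga`, `gc` (standing for the stack symbols of the first example in section 6.1) are all hypothetical, and the secondary parts of the productions are left out.

```python
def derive(production, obj):
    """Apply A(..alpha) -> G1 B(..alpha') G2 to the object A(stack).

    Returns the distinguished child B(stack'), or None when the
    non-terminal or the top of the stack does not match.
    Stacks are tuples whose top is at the right.
    """
    lhs, alpha, child, alpha2 = production
    nonterminal, stack = obj
    cut = len(stack) - len(alpha)
    if nonterminal != lhs or cut < 0 or stack[cut:] != alpha:
        return None
    return (child, stack[:cut] + alpha2)


# Two productions of the first example (secondary parts omitted).
R3 = ("S", (), "S", ("gc",))   # r3() = S(..) -> S(..gamma_c) c
R7 = ("T", ("gc",), "T", ())   # r7() = T(..gamma_c) -> c T(..)
```

For instance, `derive(R3, ("S", ()))` pushes `gc` on the empty stack, while `derive(R7, ("T", ()))` returns `None` because there is nothing to pop.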

The language defined by a LIG $L$ is the set:

$$\mathcal{L}(L) = \{x \mid S() \overset{+}{\underset{L}{\Rightarrow}} x \wedge x \in V_T^*\}$$

As in the CF case we can talk of rightmost derivations when the
rightmost object is derived at each step. Of course, many other
derivation strategies may be thought of. For our parsing
algorithm, we need one particular derives relation. Assume that
at one step an object derives both a distinguished child and a
secondary object. Our particular derivation strategy is such
that this distinguished child will always be derived after the
secondary object (and its descendants), whether this secondary
object lies to its left or to its right. This derives relation
is denoted $\underset{\ell,L}{\Rightarrow}$ and is called
linear⁴.

A spine is the sequence of objects
$A_1(\alpha_1) \ldots A_i(\alpha_i) A_{i+1}(\alpha_{i+1}) \ldots A_p(\alpha_p)$
if there is a derivation in which each object
$A_{i+1}(\alpha_{i+1})$ is the distinguished child of
$A_i(\alpha_i)$ (and therefore the distinguished descendant of
$A_j(\alpha_j)$, $1 \le j \le i$).

4 Linear Derivation Grammar 

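For a given LIG $L$, consider a linear $S()/x$-derivation

$$S() \overset{r_1()}{\underset{\ell,L}{\Rightarrow}} \ldots \overset{r_i()}{\underset{\ell,L}{\Rightarrow}} \ldots \overset{r_n()}{\underset{\ell,L}{\Rightarrow}} x$$

The sequence of productions $r_1() \ldots r_i() \ldots r_n()$
(considered in reverse order) is a string in $P_L^*$. The
purpose of this section is to define the set of such strings as
the language defined by some CFG.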

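Associated with a LIG $L = (V_N, V_T, V_I, P_L, S)$, we first
define a bunch of binary relations which are borrowed from
(Boullier, 1995):

$$\begin{array}{lcl}
\diamond_{1} & = & \{(A,B) \mid A(..) \rightarrow \Gamma_1 B(..) \Gamma_2 \in P_L\} \\
\overset{\gamma}{\prec}_{1} & = & \{(A,B) \mid A(..) \rightarrow \Gamma_1 B(..\gamma) \Gamma_2 \in P_L\} \\
\overset{\gamma}{\succ}_{1} & = & \{(A,B) \mid A(..\gamma) \rightarrow \Gamma_1 B(..) \Gamma_2 \in P_L\} \\
\diamond_{+} & = & \{(A_1,A_p) \mid A_1() \overset{+}{\underset{L}{\Rightarrow}} \Gamma_1 A_p() \Gamma_2 \text{ and } A_p() \text{ is a distinguished descendant of } A_1()\}
\end{array}$$

The 1-level relations simply indicate, for each production,
which operation can be applied to the stack associated with the
LHS non-terminal to get the stack associated with its
distinguished child; $\diamond_{1}$ indicates equality,
$\overset{\gamma}{\prec}_{1}$ the pushing of $\gamma$, and
$\overset{\gamma}{\succ}_{1}$ the popping of $\gamma$.

If we look at the evolution of a stack along a spine
$A_1(\alpha_1) \ldots A_i(\alpha_i) A_{i+1}(\alpha_{i+1}) \ldots A_p(\alpha_p)$,
between any two consecutive objects one of the following holds:
$\alpha_i = \alpha_{i+1}$, $\alpha_i\gamma = \alpha_{i+1}$, or
$\alpha_i = \alpha_{i+1}\gamma$.

The $\diamond_{+}$ relation selects pairs of non-terminals
$(A_1, A_p)$ s.t. $\alpha_1 = \alpha_p = \varepsilon$ along non
trivial spines.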



3 $r_p$ and $r_p()$ with the same index $p$ designate associated
productions.

4 linear reminds us that we are in a LIG and relies upon a
linear (total) order over object occurrences in a derivation.
See (Boullier, 1996) for a more formal definition.



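If the relations $\overset{\gamma}{\succ}_{+}$ and $\asymp$ are
defined as
$\overset{\gamma}{\succ}_{+} = \overset{\gamma}{\succ}_{1} \cup \diamond_{+}\overset{\gamma}{\succ}_{1}$
and
$\asymp = \bigcup_{\gamma \in V_I} \overset{\gamma}{\prec}_{1}\overset{\gamma}{\succ}_{+}$,
we can see that the following identity holds

Property 1

$$\diamond_{+} = \diamond_{1} \cup \asymp \cup \diamond_{1}\diamond_{+} \cup \asymp\diamond_{+}$$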


In (Boullier, 1995) we can find an algorithm⁵ which computes the
$\overset{\gamma}{\succ}_{+}$ and $\diamond_{+}$ relations as
the composition of $\overset{\gamma}{\prec}_{1}$ and
$\overset{\gamma}{\succ}_{1}$ in $O(|V_N|^3)$ time.

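Definition 3 For a LIG $L = (V_N, V_T, V_I, P_L, S)$, we call
linear derivation grammar (LDG) the CFG $D_L$ (or $D$ when $L$
is understood) $D = (V_N^D, V_T^D, P^D, S^D)$ where

• $V_N^D = \{[A] \mid A \in V_N\} \cup \{[A \rho B] \mid A, B \in V_N \wedge \rho \in \mathcal{R}\}$⁶,
  and $\mathcal{R}$ is the set of relations
  $\{\diamond_{+}, \asymp, \overset{\gamma}{\succ}_{+}\}$ with
  $\gamma \in V_I$

• $V_T^D = P_L$

• $S^D = [S]$

• Below, $[\Gamma_1\Gamma_2]$ denotes either the non-terminal
  symbol $[X]$ when $\Gamma_1\Gamma_2 = X()$ or the empty string
  $\varepsilon$ when $\Gamma_1\Gamma_2 \in V_T^*$. $P^D$ is
  defined as being

$$\begin{array}{llr}
 & \{[A] \rightarrow r() \mid r() = A() \rightarrow w \in P_L\} & (1) \\
\cup & \{[A] \rightarrow r()\,[A \diamond_{+} B] \mid r() = B() \rightarrow w \in P_L\} & (2) \\
\cup & \{[A \diamond_{+} C] \rightarrow [\Gamma_1\Gamma_2]\,r() \mid r() = A(..) \rightarrow \Gamma_1 C(..) \Gamma_2 \in P_L\} & (3) \\
\cup & \{[A \diamond_{+} C] \rightarrow [A \asymp C]\} & (4) \\
\cup & \{[A \diamond_{+} C] \rightarrow [B \diamond_{+} C]\,[\Gamma_1\Gamma_2]\,r() \mid r() = A(..) \rightarrow \Gamma_1 B(..) \Gamma_2 \in P_L\} & (5) \\
\cup & \{[A \diamond_{+} C] \rightarrow [B \diamond_{+} C]\,[A \asymp B]\} & (6) \\
\cup & \{[A \asymp C] \rightarrow [B \overset{\gamma}{\succ}_{+} C]\,[\Gamma_1\Gamma_2]\,r() \mid r() = A(..) \rightarrow \Gamma_1 B(..\gamma) \Gamma_2 \in P_L\} & (7) \\
\cup & \{[A \overset{\gamma}{\succ}_{+} C] \rightarrow [\Gamma_1\Gamma_2]\,r() \mid r() = A(..\gamma) \rightarrow \Gamma_1 C(..) \Gamma_2 \in P_L\} & (8) \\
\cup & \{[A \overset{\gamma}{\succ}_{+} C] \rightarrow [\Gamma_1\Gamma_2]\,r()\,[A \diamond_{+} B] \mid r() = B(..\gamma) \rightarrow \Gamma_1 C(..) \Gamma_2 \in P_L\} & (9)
\end{array}$$

5 Though in the referred paper these relations are defined on
constituents, the algorithm also applies to non-terminals.

6 In fact we will only use valid non-terminals $[A \rho B]$ for
which the relation $\rho$ holds between $A$ and $B$.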

The productions in $P^D$ define all the ways linear derivations
can be composed from linear sub-derivations. These compositions
rely on one side upon property 1 (recall that the productions in
$P_L$ must be produced in reverse order) and, on the other side,
upon the order in which secondary spines (the
$\Gamma_1\Gamma_2$-spines) are processed to get the linear
derivation order.
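
The nine production forms of Definition 3 can be sketched as follows for the LIG of the first example of section 6.1, followed by a top-down restriction to the symbols reachable from [S]. The encodings, the tuple names (`nt`, `eq`, `paired`, `pop`) and the function names are assumptions of this sketch; the relation values are copied from section 6.1 rather than recomputed.

```python
# First example of section 6.1 in a hypothetical encoding.
TERMINAL = {"r8": "T"}  # r8() = T() -> c
PRIMARY = {  # name: (A, popped, B, pushed, secondary non-terminal or None)
    "r1": ("S", None, "S", "ga", None), "r2": ("S", None, "S", "gb", None),
    "r3": ("S", None, "S", "gc", None), "r4": ("S", None, "T", None, None),
    "r5": ("T", "ga", "T", None, None), "r6": ("T", "gb", "T", None, None),
    "r7": ("T", "gc", "T", None, None),
}
# Relation values as listed in section 6.1.
EQ_PLUS = {("S", "T")}
PAIRED = {("S", "T")}
POP_PLUS = {g: {("T", "T"), ("S", "T")} for g in ("ga", "gb", "gc")}


def sec(x):
    """[G1 G2]: the symbol [X] when G1 G2 = X(), nothing otherwise."""
    return [("nt", x)] if x else []


def ldg():
    prods = []  # triples (lhs, rhs, form number in Definition 3)
    for r, a in TERMINAL.items():
        prods.append((("nt", a), [r], 1))
        prods += [(("nt", x), [r, ("eq", x, b)], 2)
                  for (x, b) in EQ_PLUS if b == a]
    for a, c in PAIRED:
        prods.append((("eq", a, c), [("paired", a, c)], 4))
        prods += [(("eq", a, d), [("eq", c, d), ("paired", a, c)], 6)
                  for (c2, d) in EQ_PLUS if c2 == c]
    for r, (a, pop, b, push, x) in PRIMARY.items():
        if push is not None:
            prods += [(("paired", a, c), [("pop", push, b, c)] + sec(x) + [r], 7)
                      for (b2, c) in POP_PLUS[push] if b2 == b]
        elif pop is not None:
            prods.append((("pop", pop, a, b), sec(x) + [r], 8))
            prods += [(("pop", pop, y, b), sec(x) + [r, ("eq", y, a)], 9)
                      for (y, a2) in EQ_PLUS if a2 == a]
        else:
            prods.append((("eq", a, b), sec(x) + [r], 3))
            prods += [(("eq", a, c), [("eq", b, c)] + sec(x) + [r], 5)
                      for (b2, c) in EQ_PLUS if b2 == b]
    return prods


def reachable(prods, start):
    """Keep the productions whose LHS can be reached from the start symbol."""
    seen, todo, kept = {start}, [start], []
    while todo:
        sym = todo.pop()
        for lhs, rhs, form in prods:
            if lhs == sym:
                kept.append((lhs, rhs, form))
                for s in rhs:
                    if isinstance(s, tuple) and s not in seen:
                        seen.add(s)
                        todo.append(s)
    return kept


LDG = reachable(ldg(), ("nt", "S"))
```

The reachable part contains nine productions, of forms (2), (3), (4), three times (7) and three times (9), which agrees with the production set given for this LIG in section 6.1.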

In (Boullier, 1996), we prove that LDGs are not ambiguous (in
fact they are SLR(1)) and define

$$\mathcal{L}(D) = \{r_1() \ldots r_n() \mid S() \overset{r_n()}{\underset{\ell,L}{\Rightarrow}} \ldots \overset{r_1()}{\underset{\ell,L}{\Rightarrow}} x \wedge x \in \mathcal{L}(L)\}$$

If, by some classical algorithm, we remove from $D$ all its
useless symbols, we get a reduced CFG, say
$D' = (V_N^{D'}, V_T^{D'}, P^{D'}, S^{D'})$. In this grammar,
all the terminal symbols, which are productions in $L$, are
useful. By the way, the construction of $D'$ solves the
emptiness problem for LIGs: $L$ specifies the empty set iff the
set $V_T^{D'}$ is empty⁷.

5 LIG parsing 

Given a LIG $L = (V_N, V_T, V_I, P_L, S)$ we want to find all
the syntactic structures associated with an input string
$x \in V_T^*$. In section 2 we used a CFG (the shared parse
forest) for representing all parses in a CFG. In this section we
will see how to build a CFG which represents all parses in a
LIG.

In ( Boullier, 1995 ) we give a recognizer for LIGs 
with the following scheme: in a first phase a general 
CF parsing algorithm, working on the CF-backbone 
builds a shared parse forest for a given input string x. 
In a second phase, the LIG conditions are checked on 
this forest. This checking can result in some subtree 
(production) deletions, namely the ones for which 
there is no valid symbol stack evaluation. If the re- 
sulting grammar is not empty, then x is a sentence. 
However, in the general case, this resulting grammar is not a
shared parse forest for the initial LIG in the sense that the
computations of stacks of symbols along spines are not
guaranteed to be consistent. Such invalid spines are not deleted
during the check of the LIG conditions because they could be
composed of sub-spines which are themselves parts of other valid
spines. One way to solve this problem is to unfold the shared
parse forest and to extract individual parse trees. A parse tree
is then kept iff the LIG conditions are valid on that tree. But
such a method is not practical since the number of parse trees
can be unbounded when the CF-backbone is cyclic. Even for non
cyclic grammars, the number of parse trees can be exponential in
the size of the input. Moreover, it is questionable whether a
worst case polynomial size structure could be reached by some
sharing compatible both with the syntactic and the "semantic"
features.

7 In (Vijay-Shanker and Weir, 1992) the emptiness problem for
LIGs is solved by constructing an FSA.

However, we know that derivations in TAGs are context-free (see
(Vijay-Shanker, 1987)) and (Vijay-Shanker and Weir, 1993)
exhibits a CFG which represents all possible derivation
sequences in a TAG. We will show that an analogous result holds
for LIGs and leads to an O(n^6) time parsing algorithm.

Definition 4 Let $L = (V_N, V_T, V_I, P_L, S)$ be a LIG,
$G = (V_N, V_T, P_G, S)$ its CF-backbone, $x$ a string in
$\mathcal{L}(G)$, and $G^x = (V_N^x, V_T^x, P_G^x, S^x)$ its
shared parse forest for $x$. We define the LIGed forest for $x$
as being the LIG $L^x = (V_N^x, V_T^x, V_I, P_L^x, S^x)$ s.t.
$G^x$ is its CF-backbone and its productions are the productions
of $P_G^x$ in which the corresponding stack-schemas of $L$ have
been added. For example
$r_p^q() = [A]_i^k(..\alpha) \rightarrow [B]_i^j(..\alpha')[C]_j^k() \in P_L^x$
iff
$r_p^q = [A]_i^k \rightarrow [B]_i^j[C]_j^k \in P_G^x \wedge r_p = A \rightarrow BC \in G \wedge r_p() = A(..\alpha) \rightarrow B(..\alpha')C() \in L$.

Between a LIG $L$ and its LIGed forest $L^x$ for $x$, we have:

$$x \in \mathcal{L}(L) \iff x \in \mathcal{L}(L^x)$$



If we follow (Lang, 1994), the previous definition which
produces a LIGed forest from any $L$ and $x$ is a (LIG) parser⁸:
given a LIG $L$ and a string $x$, we have constructed a new LIG
$L^x$ for the intersection $\mathcal{L}(L) \cap \{x\}$, which is
the shared forest for all parses of the sentences in the
intersection. However, we wish to go one step further since the
parsing (or even recognition) problem for LIGs cannot be
trivially extracted from the LIGed forests.

Our vision for the parsing of a string $x$ with a LIG $L$ can be
summarized in a few lines. Let $G$ be the CF-backbone of $L$; we
first build $G^x$, the CFG shared parse forest, by any classical
general CF parsing algorithm and then $L^x$, its LIGed forest.
Afterwards, we build the reduced LDG $D_{L^x}$ associated with
$L^x$ as shown in section 4.



The recognition problem for $(L, x)$ (i.e. is $x$ an element of
$\mathcal{L}(L)$) is equivalent to the non-emptiness of the
production set of $D_{L^x}$.

Moreover, each linear $S()/x$-derivation in $L$ is (the reverse
of) a string in $\mathcal{L}(D_{L^x})$⁹. So the extraction of
individual parses in a LIG is merely reduced to the derivation
of strings in a CFG.

An important issue is the complexity, in time and space, of
$D_{L^x}$. Let $n$ be the length of the input string $x$. Since
$G$ is in binary form we know that the shared parse forest $G^x$
can be built in $O(n^3)$ time and the number of its productions
is also in $O(n^3)$. Moreover, the cardinality of $V_N^x$ is
$O(n^2)$ and, for any given non-terminal, say $[A]_p^q$, there
are at most $O(n)$ $[A]_p^q$-productions. Of course, these
complexities extend to the LIGed forest $L^x$.

We now look at the LDG complexity when the input LIG is a LIGed
forest. In fact, we mainly have to check two forms of
productions (see definition 3). The first form is production (6)
($[A \diamond_{+} C] \rightarrow [B \diamond_{+} C][A \asymp B]$),
where three different non-terminals in $V_N$ are implied (i.e.
$A$, $B$ and $C$), so the number of productions of that form is
cubic in the number of non-terminals and therefore is $O(n^6)$.

In the second form (productions (5), (7) and (9)), exemplified
by
$[A \asymp C] \rightarrow [B \overset{\gamma}{\succ}_{+} C][\Gamma_1\Gamma_2]\,r()$,
there are four non-terminals in $V_N$ (i.e. $A$, $B$, $C$, and
$X$ if $\Gamma_1\Gamma_2 = X()$) and a production $r()$ (the
number of relation symbols $\overset{\gamma}{\succ}_{+}$ is a
constant); therefore, the number of such productions seems to be
of fourth degree in the number of non-terminals and linear in
the number of productions. However, these variables are not
independent. For a given $A$, the number of triples
$(B, X, r())$ is the number of $A$-productions, hence $O(n)$.
So, in the end, the number of productions of that form is
$O(n^5)$.

We can easily check that the other forms of productions have a
lesser degree.

Therefore, the number of productions is dominated by the first
form and the size (and in fact the construction time) of this
grammar is $O(n^6)$.

This (once again) shows that the recognition and parsing problem
for a LIG can be solved in $O(n^6)$ time.
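
The counting argument above can be summarized as the following worked bounds, where $\#(k)$ is a notation introduced here (not in the paper) for the number of productions of form $(k)$ in the LDG of a LIGed forest:

```latex
\begin{align*}
|V_N^x| &= O(n^2), \qquad
\#\{[A]_p^q\text{-productions}\} = O(n) \text{ for each } [A]_p^q \\
\#(6) &\le |V_N^x|^3 = O\big((n^2)^3\big) = O(n^6)
  && \text{free choice of } A, B, C \\
\#(5), \#(7), \#(9) &\le
  \underbrace{|V_N^x|}_{A} \cdot
  \underbrace{O(n)}_{(B,\,X,\,r())} \cdot
  \underbrace{|V_N^x|}_{C}
  = O(n^2 \cdot n \cdot n^2) = O(n^5)
\end{align*}
```

The first form therefore dominates, which gives the $O(n^6)$ bound on the size of the grammar.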

For a LDG $D = (V_N^D, V_T^D, P^D, S^D)$, we note that for any
given non-terminal $A \in V_N^D$ and string
$\sigma \in \mathcal{L}(A)$ with $|\sigma| \ge 2$, a single
production $A \rightarrow X_1X_2$ or $A \rightarrow X_1X_2X_3$
in $P^D$ is needed to "cut" $\sigma$ into two or three non-empty
pieces $\sigma_1$, $\sigma_2$, and $\sigma_3$, such that
$X_i \overset{*}{\underset{D}{\Rightarrow}} \sigma_i$, except
when the production form number (4) is used. In such a case,
this cutting needs two productions (namely (4) and (7)). This
shows that the cutting out of any string of length $l$ into
elementary pieces of length 1 is performed using $O(l)$
productions. Therefore, the extraction of a linear
$S()/x$-derivation in $L$ is performed in time linear in the
length of that derivation. If we assume that the CF-backbone $G$
is non cyclic, the extraction of a parse is linear in $n$.
Moreover, during an extraction, since $D_{L^x}$ is not
ambiguous, at some place, the choice of another $A$-production
will result in a different linear derivation.

8 Of course, instead of $x$, we can consider any FSA.

9 In fact, the terminal symbols in $D_{L^x}$ are productions in
$L^x$ (say $r_p^q()$), which trivially can be mapped to
productions in $L$ (here $r_p()$).

Of course, practical generations of LDGs must improve over a
blind application of definition 3. One way is to consider a
top-down strategy: the $X$-productions in a LDG are generated
iff $X$ is the start symbol or occurs in the RHS of an already
generated production. The examples in section 6 are produced
this way.

If the number of ambiguities in the initial LIG is bounded, the
size of $D_{L^x}$, for a given input string $x$ of length $n$,
is linear in $n$.

The size and the time needed to compute $D_{L^x}$ are closely
related to the actual sizes of the $\diamond_{+}$,
$\overset{\gamma}{\succ}_{+}$ and $\asymp$ relations. As pointed
out in (Boullier, 1995), their $O(n^4)$ maximum sizes seem to be
seldom reached in practice. This means that the average parsing
time is much better than this $O(n^6)$ worst case.

Moreover, our parsing schema allows us to avoid some useless
computations. Assume that the symbol $[A \diamond_{+} B]$ is
useless in the LDG $D_L$ associated with the initial LIG $L$; we
know that any non-terminal of the form
$[[A]_i^j \diamond_{+} [B]_k^l]$ is also useless in $D_{L^x}$.
Therefore, the static computation of a reduced LDG for the
initial LIG $L$ (and the corresponding $\diamond_{+}$,
$\overset{\gamma}{\succ}_{+}$ and $\asymp$ relations) can be
used to direct the parsing process and decrease the parsing time
(see section 6).

6 Two Examples 
6.1 First Example 

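In this section, we illustrate our algorithm with a LIG
$L = (\{S,T\},\{a,b,c\},\{\gamma_a,\gamma_b,\gamma_c\},P_L,S)$
where $P_L$ contains the following productions:

$$\begin{array}{ll}
r_1() = S(..) \rightarrow S(..\gamma_a)\,a & r_2() = S(..) \rightarrow S(..\gamma_b)\,b \\
r_3() = S(..) \rightarrow S(..\gamma_c)\,c & r_4() = S(..) \rightarrow T(..) \\
r_5() = T(..\gamma_a) \rightarrow a\,T(..) & r_6() = T(..\gamma_b) \rightarrow b\,T(..) \\
r_7() = T(..\gamma_c) \rightarrow c\,T(..) & r_8() = T() \rightarrow c
\end{array}$$

It is easy to see that its CF-backbone $G$, whose production set
$P_G$ is:

$$\begin{array}{llll}
S \rightarrow Sa & S \rightarrow Sb & S \rightarrow Sc & S \rightarrow T \\
T \rightarrow aT & T \rightarrow bT & T \rightarrow cT & T \rightarrow c
\end{array}$$

defines the language
$\mathcal{L}(G) = \{wcw' \mid w, w' \in \{a,b,c\}^*\}$. We
remark that the stacks of symbols in $L$ constrain the string
$w'$ to be equal to $w$ and therefore the language
$\mathcal{L}(L)$ is $\{wcw \mid w \in \{a,b,c\}^*\}$.

We note that in $L$ the key part is played by the middle $c$,
introduced by production $r_8()$, and that this grammar is non
ambiguous, while in $G$ the symbol $c$, introduced by the last
production $T \rightarrow c$, is only a separator between $w$
and $w'$ and that this grammar is ambiguous (any occurrence of
$c$ may be this separator).

The computation of the relations gives:

$$\begin{array}{lcl}
\diamond_{1} & = & \{(S,T)\} \\
\overset{\gamma_a}{\prec}_{1} = \overset{\gamma_b}{\prec}_{1} = \overset{\gamma_c}{\prec}_{1} & = & \{(S,S)\} \\
\overset{\gamma_a}{\succ}_{1} = \overset{\gamma_b}{\succ}_{1} = \overset{\gamma_c}{\succ}_{1} & = & \{(T,T)\} \\
\diamond_{+} & = & \{(S,T)\} \\
\asymp & = & \{(S,T)\} \\
\overset{\gamma_a}{\succ}_{+} = \overset{\gamma_b}{\succ}_{+} = \overset{\gamma_c}{\succ}_{+} & = & \{(T,T),(S,T)\}
\end{array}$$

The production set $P^D$ of the LDG $D$ associated with $L$ is:

$$\begin{array}{lclr}
[S] & \rightarrow & r_8()\,[S \diamond_{+} T] & (2) \\
[S \diamond_{+} T] & \rightarrow & r_4() & (3) \\
[S \diamond_{+} T] & \rightarrow & [S \asymp T] & (4) \\
[S \asymp T] & \rightarrow & [S \overset{\gamma_a}{\succ}_{+} T]\,r_1() & (7) \\
[S \asymp T] & \rightarrow & [S \overset{\gamma_b}{\succ}_{+} T]\,r_2() & (7) \\
[S \asymp T] & \rightarrow & [S \overset{\gamma_c}{\succ}_{+} T]\,r_3() & (7) \\
[S \overset{\gamma_a}{\succ}_{+} T] & \rightarrow & r_5()\,[S \diamond_{+} T] & (9) \\
[S \overset{\gamma_b}{\succ}_{+} T] & \rightarrow & r_6()\,[S \diamond_{+} T] & (9) \\
[S \overset{\gamma_c}{\succ}_{+} T] & \rightarrow & r_7()\,[S \diamond_{+} T] & (9)
\end{array}$$

The numbers in parentheses refer to definition 3. We can easily
check that this grammar is reduced.

Let $x = ccc$ be an input string. Since $x$ is an element of
$\mathcal{L}(G)$, its shared parse forest $G^x$ is not empty.
Its production set $P_G^x$ is:

$$\begin{array}{ll}
r_3^1 = [S]_0^3 \rightarrow [S]_0^2\,c & r_4^2 = [S]_0^3 \rightarrow [T]_0^3 \\
r_3^3 = [S]_0^2 \rightarrow [S]_0^1\,c & r_4^4 = [S]_0^2 \rightarrow [T]_0^2 \\
r_4^5 = [S]_0^1 \rightarrow [T]_0^1 & r_7^6 = [T]_0^3 \rightarrow c\,[T]_1^3 \\
r_7^7 = [T]_1^3 \rightarrow c\,[T]_2^3 & r_8^8 = [T]_2^3 \rightarrow c \\
r_7^9 = [T]_0^2 \rightarrow c\,[T]_1^2 & r_8^{10} = [T]_1^2 \rightarrow c \\
r_8^{11} = [T]_0^1 \rightarrow c &
\end{array}$$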

We can observe that this shared parse forest denotes in fact
three different parse trees, each one corresponding to a
different cutting out of $x = wcw'$ (i.e. $w = \varepsilon$ and
$w' = cc$, or $w = c$ and $w' = c$, or $w = cc$ and
$w' = \varepsilon$).
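
The shared parse forest for x = ccc can be checked with the small sketch below, which counts the parse trees of the CF-backbone and collects the forest productions reachable from the start triple. It is an illustration only, not the general CF parser mentioned in the text: it assumes a non cyclic backbone in which every binary production contains a terminal, and the names `splits`, `count`, `prod_count` and `forest` are hypothetical.

```python
from functools import lru_cache

X = "ccc"
NT = {"S", "T"}
BACKBONE = [  # CF-backbone of the first example
    ("S", ("S", "a")), ("S", ("S", "b")), ("S", ("S", "c")), ("S", ("T",)),
    ("T", ("a", "T")), ("T", ("b", "T")), ("T", ("c", "T")), ("T", ("c",)),
]


def splits(rhs, i, j):
    """Cut points of X[i:j] at which every terminal of rhs matches X."""
    if len(rhs) == 1:
        cuts = [(i, j)]
    else:
        cuts = [(i, k, j) for k in range(i, j + 1)]
    return [c for c in cuts
            if all(s in NT or X[c[n]:c[n + 1]] == s for n, s in enumerate(rhs))]


@lru_cache(maxsize=None)
def count(sym, i, j):
    """Number of parse trees of sym over X[i:j] (a matched terminal counts 1)."""
    if sym not in NT:
        return 1
    return sum(prod_count(rhs, cut)
               for lhs, rhs in BACKBONE if lhs == sym
               for cut in splits(rhs, i, j))


def prod_count(rhs, cut):
    """Number of trees rooted in one production for one choice of cut points."""
    total = 1
    for n, s in enumerate(rhs):
        total *= count(s, cut[n], cut[n + 1])
    return total


def forest(sym, i, j, out):
    """Collect the productive productions reachable from the triple [sym]_i^j."""
    for lhs, rhs in BACKBONE:
        if lhs != sym:
            continue
        for cut in splits(rhs, i, j):
            if prod_count(rhs, cut) == 0:
                continue
            items = tuple((s, cut[n], cut[n + 1]) if s in NT else s
                          for n, s in enumerate(rhs))
            production = ((sym, i, j), items)
            if production not in out:
                out.add(production)
                for item in items:
                    if isinstance(item, tuple):
                        forest(*item, out)
    return out


TREES = count("S", 0, len(X))
FOREST = forest("S", 0, len(X), set())
```

Here `TREES` is the number of parse trees denoted by the forest and `FOREST` holds its productions as pairs of a triple and a right-hand side.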

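The corresponding LIGed forest, whose start symbol is
$S^x = [S]_0^3$ and production set $P_L^x$, is:

$$\begin{array}{lcl}
r_3^1() & = & [S]_0^3(..) \rightarrow [S]_0^2(..\gamma_c)\,c \\
r_4^2() & = & [S]_0^3(..) \rightarrow [T]_0^3(..) \\
r_3^3() & = & [S]_0^2(..) \rightarrow [S]_0^1(..\gamma_c)\,c \\
r_4^4() & = & [S]_0^2(..) \rightarrow [T]_0^2(..) \\
r_4^5() & = & [S]_0^1(..) \rightarrow [T]_0^1(..) \\
r_7^6() & = & [T]_0^3(..\gamma_c) \rightarrow c\,[T]_1^3(..) \\
r_7^7() & = & [T]_1^3(..\gamma_c) \rightarrow c\,[T]_2^3(..) \\
r_8^8() & = & [T]_2^3() \rightarrow c \\
r_7^9() & = & [T]_0^2(..\gamma_c) \rightarrow c\,[T]_1^2(..) \\
r_8^{10}() & = & [T]_1^2() \rightarrow c \\
r_8^{11}() & = & [T]_0^1() \rightarrow c
\end{array}$$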



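For this LIGed forest the relations are:

$$\begin{array}{lcl}
\diamond_{1} & = & \{([S]_0^3,[T]_0^3),([S]_0^2,[T]_0^2),([S]_0^1,[T]_0^1)\} \\
\overset{\gamma_c}{\prec}_{1} & = & \{([S]_0^3,[S]_0^2),([S]_0^2,[S]_0^1)\} \\
\overset{\gamma_c}{\succ}_{1} & = & \{([T]_0^3,[T]_1^3),([T]_1^3,[T]_2^3),([T]_0^2,[T]_1^2)\} \\
\asymp & = & \{([S]_0^3,[T]_1^2)\} \\
\diamond_{+} & = & \diamond_{1} \cup \asymp \\
\overset{\gamma_c}{\succ}_{+} & = & \overset{\gamma_c}{\succ}_{1} \cup \{([S]_0^3,[T]_1^3),([S]_0^2,[T]_1^2)\}
\end{array}$$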



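The start symbol of the LDG associated with the LIGed forest
$L^x$ is $[[S]_0^3]$. If we assume that an $A$-production is
generated iff it is an $[[S]_0^3]$-production or $A$ occurs in
an already generated production, we get:

$$\begin{array}{lclr}
[[S]_0^3] & \rightarrow & r_8^{10}()\,[[S]_0^3 \diamond_{+} [T]_1^2] & (2) \\
[[S]_0^3 \diamond_{+} [T]_1^2] & \rightarrow & [[S]_0^3 \asymp [T]_1^2] & (4) \\
[[S]_0^3 \asymp [T]_1^2] & \rightarrow & [[S]_0^2 \overset{\gamma_c}{\succ}_{+} [T]_1^2]\,r_3^1() & (7) \\
[[S]_0^2 \overset{\gamma_c}{\succ}_{+} [T]_1^2] & \rightarrow & r_7^9()\,[[S]_0^2 \diamond_{+} [T]_0^2] & (9) \\
[[S]_0^2 \diamond_{+} [T]_0^2] & \rightarrow & r_4^4() & (3)
\end{array}$$

This CFG is reduced. Since its production set is non empty, we
have $ccc \in \mathcal{L}(L)$. Its language is
$\{r_8^{10}()\,r_7^9()\,r_4^4()\,r_3^1()\}$ which shows that the
only linear derivation in $L$ is
$S() \overset{r_3()}{\underset{\ell,L}{\Rightarrow}} S(\gamma_c)c \overset{r_4()}{\underset{\ell,L}{\Rightarrow}} T(\gamma_c)c \overset{r_7()}{\underset{\ell,L}{\Rightarrow}} cT()c \overset{r_8()}{\underset{\ell,L}{\Rightarrow}} ccc$.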



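In computing the relations for the initial LIG $L$, we remark
that though $T \overset{\gamma_a}{\succ}_{+} T$,
$T \overset{\gamma_b}{\succ}_{+} T$, and
$T \overset{\gamma_c}{\succ}_{+} T$, the non-terminals
$[T \overset{\gamma_a}{\succ}_{+} T]$,
$[T \overset{\gamma_b}{\succ}_{+} T]$, and
$[T \overset{\gamma_c}{\succ}_{+} T]$ are not used in $P^D$.
This means that for any LIGed forest $L^x$, the elements of the
form $([T]_p^q, [T]_{p'}^{q'})$ do not need to be computed in
the $\overset{\gamma_a}{\succ}_{+}$,
$\overset{\gamma_b}{\succ}_{+}$, and
$\overset{\gamma_c}{\succ}_{+}$ relations since they will never
produce a useful non-terminal. In this example, the subset
$\overset{\gamma_c}{\succ}_{1}$ of
$\overset{\gamma_c}{\succ}_{+}$ is useless.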

The next example shows the handling of a cyclic 
grammar. 

6.2 Second Example 

The following LIG $L$, where $A$ is the start symbol:

$$\begin{array}{ll}
r_1() = A(..) \rightarrow A(..\gamma_a) & r_2() = A(..) \rightarrow B(..) \\
r_3() = B(..\gamma_a) \rightarrow B(..) & r_4() = B() \rightarrow a
\end{array}$$

is cyclic (we have $A \overset{+}{\Rightarrow} A$ and
$B \overset{+}{\Rightarrow} B$ in its CF-backbone), and the
stack schemas in production $r_1()$ indicate that an unbounded
number of push $\gamma_a$ actions can take place, while
production $r_3()$ indicates an unbounded number of pops. Its
CF-backbone is unboundedly ambiguous though its language
contains the single string $a$.

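The computation of the relations gives:

$$\begin{array}{lcl}
\diamond_{1} & = & \{(A,B)\} \\
\overset{\gamma_a}{\prec}_{1} & = & \{(A,A)\} \\
\overset{\gamma_a}{\succ}_{1} & = & \{(B,B)\} \\
\diamond_{+} & = & \{(A,B)\} \\
\asymp & = & \{(A,B)\} \\
\overset{\gamma_a}{\succ}_{+} & = & \{(A,B),(B,B)\}
\end{array}$$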



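The start symbol of the LDG associated with $L$ is $[A]$ and its
production set $P^D$ is:

$$\begin{array}{lclr}
[A] & \rightarrow & r_4()\,[A \diamond_{+} B] & (2) \\
[A \diamond_{+} B] & \rightarrow & r_2() & (3) \\
[A \diamond_{+} B] & \rightarrow & [A \asymp B] & (4) \\
[A \asymp B] & \rightarrow & [A \overset{\gamma_a}{\succ}_{+} B]\,r_1() & (7) \\
[A \overset{\gamma_a}{\succ}_{+} B] & \rightarrow & r_3()\,[A \diamond_{+} B] & (9)
\end{array}$$

We can easily check that this grammar is reduced.

We want to parse the input string $x = a$ (i.e. find all the
linear $S()/a$-derivations).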



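Its LIGed forest, whose start symbol is $[A]_0^1$, is:

$$\begin{array}{lcl}
r_1^1() & = & [A]_0^1(..) \rightarrow [A]_0^1(..\gamma_a) \\
r_2^2() & = & [A]_0^1(..) \rightarrow [B]_0^1(..) \\
r_3^3() & = & [B]_0^1(..\gamma_a) \rightarrow [B]_0^1(..) \\
r_4^4() & = & [B]_0^1() \rightarrow a
\end{array}$$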



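For this LIGed forest $L^x$, the relations are:

$$\begin{array}{lcl}
\diamond_{1} & = & \{([A]_0^1,[B]_0^1)\} \\
\overset{\gamma_a}{\prec}_{1} & = & \{([A]_0^1,[A]_0^1)\} \\
\overset{\gamma_a}{\succ}_{1} & = & \{([B]_0^1,[B]_0^1)\} \\
\diamond_{+} & = & \{([A]_0^1,[B]_0^1)\} \\
\asymp & = & \{([A]_0^1,[B]_0^1)\} \\
\overset{\gamma_a}{\succ}_{+} & = & \{([A]_0^1,[B]_0^1),([B]_0^1,[B]_0^1)\}
\end{array}$$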



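The start symbol of the LDG associated with $L^x$ is
$[[A]_0^1]$. If we assume that an $A$-production is generated
iff it is an $[[A]_0^1]$-production or $A$ occurs in an already
generated production, its production set is:

$$\begin{array}{lclr}
[[A]_0^1] & \rightarrow & r_4^4()\,[[A]_0^1 \diamond_{+} [B]_0^1] & (2) \\
[[A]_0^1 \diamond_{+} [B]_0^1] & \rightarrow & r_2^2() & (3) \\
[[A]_0^1 \diamond_{+} [B]_0^1] & \rightarrow & [[A]_0^1 \asymp [B]_0^1] & (4) \\
[[A]_0^1 \asymp [B]_0^1] & \rightarrow & [[A]_0^1 \overset{\gamma_a}{\succ}_{+} [B]_0^1]\,r_1^1() & (7) \\
[[A]_0^1 \overset{\gamma_a}{\succ}_{+} [B]_0^1] & \rightarrow & r_3^3()\,[[A]_0^1 \diamond_{+} [B]_0^1] & (9)
\end{array}$$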



This CFG is reduced. Since its production set is non empty, we
have $a \in \mathcal{L}(L)$. Its language is
$\{r_4^4()\{r_3^3()\}^k\,r_2^2()\{r_1^1()\}^k \mid 0 \le k\}$
which shows that the only valid linear derivations w.r.t. $L$
must contain an identical number $k$ of productions which push
$\gamma_a$ (i.e. the production $r_1^1()$) and productions which
pop $\gamma_a$ (i.e. the production $r_3^3()$).
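
The shape of this language can be checked on the LDG of the initial LIG L, which has the same structure with r1 ... r4 in place of the indexed production names. The sketch below enumerates a few sentences and replays each of them, in reverse order, as a derivation on a stack. The short non-terminal names X, Y, Z, the function names and the depth bound are assumptions of this sketch.

```python
from itertools import product

# LDG of the second example for the initial LIG L. The short names are
# assumptions of this sketch: X is the equality symbol between A and B,
# Y is the symbol introduced by form (4), Z is the popping symbol.
LDG = {
    "[A]": [["r4", "X"]],   # form (2)
    "X": [["r2"], ["Y"]],   # forms (3) and (4)
    "Y": [["Z", "r1"]],     # form (7)
    "Z": [["r3", "X"]],     # form (9)
}


def sentences(symbol, depth):
    """Sentences derivable from `symbol` with at most `depth` nested productions."""
    if symbol not in LDG:
        return {(symbol,)}
    if depth == 0:
        return set()
    found = set()
    for rhs in LDG[symbol]:
        parts = [sentences(s, depth - 1) for s in rhs]
        for combo in product(*parts):
            found.add(tuple(t for part in combo for t in part))
    return found


def replay(derivation):
    """Run a production sequence on the object A(); True iff it yields a."""
    nonterminal, stack, done = "A", [], False
    for r in derivation:
        if done:
            return False
        if r == "r1" and nonterminal == "A":
            stack.append("ga")           # r1() = A(..) -> A(..gamma_a)
        elif r == "r2" and nonterminal == "A":
            nonterminal = "B"            # r2() = A(..) -> B(..)
        elif r == "r3" and nonterminal == "B" and stack:
            stack.pop()                  # r3() = B(..gamma_a) -> B(..)
        elif r == "r4" and nonterminal == "B" and not stack:
            done = True                  # r4() = B() -> a
        else:
            return False
    return done


STRINGS = sentences("[A]", 8)
```

Every enumerated sentence, once reversed, replays as a valid derivation, whereas an unbalanced sequence such as r1 r2 r4 is rejected by `replay`.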

As in the previous example, we can see that the element
$[B]_0^1 \overset{\gamma_a}{\succ}_{+} [B]_0^1$ is useless.

7 Conclusion

We have shown that the parses of a LIG can be represented by a
non ambiguous CFG. This representation captures the fact that
the values of a stack of symbols are well parenthesized. When a
symbol $\gamma$ is pushed on a stack at a given index at some
place, this very symbol must be popped some place else, and we
know that such (recursive) pairing is the essence of
context-freeness.

In this approach, the number of productions and the construction
time of this CFG are at worst O(n^6), though much better results
occur in practical situations. Moreover, static computations on
the initial LIG may decrease this practical complexity by
avoiding useless computations. Each sentence in this CFG is a
derivation of the given input string by the LIG, and is
extracted in linear time.